
Add more property-caching optimizations to x509 Rust backend #14441

Open
abbra wants to merge 10 commits into pyca:main from abbra:abbra-p-f

Conversation


@abbra abbra commented Mar 8, 2026

PyCA x509 Rust backend — caching optimizations

I was working on my general-purpose ASN.1 library, and while creating Python bindings I tested against the PyCA code. Some operations in PyCA were slow compared to my code, so I wanted to look into what could be improved. With the help of Claude Code, I improved several recurring patterns using the same approach PyCA already had in place for some properties.

Below is a report Claude created.

Background

The x509 Rust backend (src/rust/src/x509/) converts parsed ASN.1 data into Python objects on every property access. Operations like name parsing (parse_name), public-key loading, OID conversion, and serial-number iteration are not cheap: each one allocates Python objects, traverses ASN.1 sequences, and crosses the Rust/Python FFI boundary. In workloads that touch the same property more than once on the same object (chain building, path validation, CRL checking, OCSP processing) this cost is paid repeatedly and unnecessarily.

The existing mitigation is pyo3::sync::PyOnceLock<pyo3::Py<pyo3::PyAny>>: a thread-safe write-once cell that stores the Python object after the first computation. It was already used for extension lists everywhere. The work described here extends that pattern to the remaining uncached properties.

Caching pattern

Every cached getter follows the same idiom:

```rust
// struct field
cached_foo: pyo3::sync::PyOnceLock<pyo3::Py<pyo3::PyAny>>,

// getter
fn foo<'p>(&self, py: Python<'p>) -> PyResult<Bound<'p, PyAny>> {
    Ok(self.cached_foo
        .get_or_try_init(py, || expensive_computation(py).map(|v| v.unbind()))?
        .bind(py)
        .clone())
}
```

After the first call, get_or_try_init is effectively a no-op: the atomic check costs ~50 ns and the cached result is returned without any new allocation.
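The same write-once semantics can be sketched in Python. This is a hypothetical CachedCert stand-in for illustration, not the real binding; PyOnceLock's atomics are approximated here with a double-checked lock:

```python
import threading

class CachedCert:
    """Hypothetical Python analogue of the PyOnceLock idiom: compute the
    property once, then return the same object on every later access."""

    def __init__(self, der: bytes):
        self._der = der
        self._subject = None           # plays the role of the PyOnceLock cell
        self._lock = threading.Lock()  # PyOnceLock is thread-safe; so is this

    @property
    def subject(self):
        if self._subject is None:          # fast path: cache already filled
            with self._lock:
                if self._subject is None:  # double-checked under the lock
                    self._subject = self._expensive_parse()
        return self._subject

    def _expensive_parse(self):
        # Stand-in for parse_name: allocates a fresh object from raw bytes.
        return list(self._der)

cert = CachedCert(b"\x30\x03\x02\x01\x01")
assert cert.subject is cert.subject  # same object on repeated access
```

The fast path costs only a None check and an attribute load, which mirrors why the warm-path numbers below are flat regardless of how expensive the first computation was.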

What was implemented

Ten changes were made across five files, committed individually on the performance-improvements branch.

| Commit | File | Change |
| --- | --- | --- |
| e7cc638 | csr.rs | Cache CertificateSigningRequest.attributes |
| 75451dc | ocsp_req.rs | Cache OCSPRequest issuer_name_hash, issuer_key_hash, hash_algorithm, serial_number |
| 1830ca3 | certificate.rs, pkcs7.rs, ocsp_resp.rs | Cache Certificate issuer, subject, public_key, signature_algorithm_oid, signature_hash_algorithm |
| e90f4e5 | ocsp_resp.rs | Fix O(n²) certificates iteration; cache the resulting list |
| 986298b | ocsp_resp.rs | Add OCSPSingleResponse.extensions getter with caching |
| f18d144 | crl.rs | Cache CRL issuer, signature_algorithm_oid, signature_hash_algorithm |
| d149aaf | crl.rs | Replace get_revoked_certificate_by_serial_number linear scan with O(1) HashMap |

The OCSPResponse.certificates getter additionally had a documented O(n²) bug (each certificate extracted via clone().nth(i) restarted the iterator). It was replaced with a single linear pass using asn1::write_single to produce independent DER bytes for each certificate, eliminating the need for the map_arc_data_ocsp_response unsafe helper.
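The iterator-restart cost behind that bug can be illustrated in Python. These helpers are hypothetical; Rust's clone().nth(i) on a parse iterator behaves like calling iter() again and skipping i elements each time:

```python
def quadratic(items):
    """O(n²): index a one-shot iterator by restarting it per element,
    mimicking the old clone().nth(i) pattern."""
    out = []
    for i in range(len(items)):
        it = iter(items)       # "clone" restarts from the beginning
        for _ in range(i):     # nth(i): re-skip the first i elements
            next(it)
        out.append(next(it))
    return out

def linear(items):
    """O(n): a single pass, as in the fixed implementation."""
    return [x for x in items]

data = ["cert0", "cert1", "cert2"]
assert quadratic(data) == linear(data) == data
```

Both produce the same list, but the quadratic version re-walks the prefix for every element, which is exactly the work the single linear pass eliminates.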

Benchmark results

Benchmarks measure repeated access on a single pre-loaded object — the workload that caching is designed to accelerate. Each benchmark creates the object once outside the timed loop, then calls the getter in a tight loop.
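The cold-path versus warm-path distinction can be sketched without pytest-benchmark. The bench helper below is a minimal, illustrative stand-in, not the actual benchmark harness:

```python
import time

def bench(fn, rounds=1000):
    """Minimal stand-in for pytest-benchmark: median per-call time in ns."""
    samples = []
    for _ in range(rounds):
        t0 = time.perf_counter_ns()
        fn()
        samples.append(time.perf_counter_ns() - t0)
    samples.sort()
    return samples[len(samples) // 2]

# Cold path (what the existing load benchmarks measure): a fresh object,
# and therefore an empty cache, on every iteration:
#   bench(lambda: x509.load_der_x509_certificate(der).subject)
#
# Warm path (what the new benchmarks measure): the object is created once
# outside the timed loop, so every call after the first hits the cache:
#   cert = x509.load_der_x509_certificate(der)
#   bench(lambda: cert.subject)
```

Only the warm-path loop can observe a caching win, which is why the new benchmarks construct the object once up front.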

Comparison: main (baseline) vs abbra-p-f (PR), both built with maturin develop --release, Python 3.14, OpenSSL 3.5.

| Benchmark | Baseline (median) | PR (median) | Speedup |
| --- | --- | --- | --- |
| certificate_subject | 7141 ns | 56 ns | 99% faster |
| certificate_issuer | 5626 ns | 56 ns | 99% faster |
| crl_issuer | 5460 ns | 56 ns | 99% faster |
| certificate_public_key | 1361 ns | 55 ns | 96% faster |
| ocsp_request_properties | 1525 ns | 119 ns | 92% faster |
| crl_serial_number_lookup_miss | 2159 ns | 224 ns | 90% faster |
| certificate_signature_hash_algorithm | 190 ns | 56 ns | 71% faster |
| certificate_signature_algorithm_oid | 109 ns | 56 ns | 49% faster |
| crl_serial_number_lookup_hit | 448 ns | 295 ns | 34% faster |
| ocsp_response_properties | 1190 ns | 1150 ns | ~3% (noise) |

The subject/issuer/CRL-issuer gains are ~100× because parse_name is the most expensive operation — it constructs a full Python Name object tree from ASN.1 on every call. The cached path costs only an atomic load plus a Python reference clone (~50 ns regardless of name complexity).

crl_serial_number_lookup_hit is 34% faster rather than near-zero because get_revoked_certificate_by_serial_number must still construct a new RevokedCertificate Python object on each hit (the HashMap stores OwnedRevokedCertificate values that are cloned per call). The miss path (90% faster) avoids iterating the whole list and drops from O(n) to O(1).
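The lazy index build plus per-hit object construction can be sketched in Python. RevokedIndex is a hypothetical stand-in for the CRL change, not the real API:

```python
class RevokedIndex:
    """Hypothetical sketch: build a serial -> entry map lazily on the
    first lookup, then answer later lookups in O(1)."""

    def __init__(self, revoked):
        self._revoked = revoked  # list of (serial, entry) pairs
        self._index = None

    def get_by_serial(self, serial):
        if self._index is None:            # first call: O(n) build
            self._index = dict(self._revoked)
        entry = self._index.get(serial)    # every call: O(1) lookup
        # Mirrors the real code: a fresh RevokedCertificate-like object is
        # still constructed per hit, so the hit path is not near-zero cost.
        return dict(entry) if entry is not None else None

idx = RevokedIndex([(1, {"serial": 1}), (2, {"serial": 2})])
assert idx.get_by_serial(2) == {"serial": 2}
assert idx.get_by_serial(99) is None  # miss: no scan of the whole list
```

The miss path benefits most because it previously paid the full O(n) scan only to find nothing; the hit path still pays for the per-call object construction.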

ocsp_response_properties shows no meaningful change because the properties benchmarked there (issuer_key_hash, serial_number, signature_hash_algorithm on the response-level object) were already relatively cheap and the test exercises only a few iterations of the warm path.

Why the existing load benchmarks showed no improvement

test_load_der_certificate and test_load_pem_certificate each call x509.load_der_x509_certificate(bytes) per iteration, creating a fresh object with empty caches each time. The cache is always cold; caching adds zero benefit and a tiny overhead (extra PyOnceLock::new() fields). These benchmarks measure parsing throughput, not property-access throughput, so they are unaffected by this work.

Benchmark reproduction

```shell
uv venv /tmp/bench-venv --python python3.14
uv pip install --python /tmp/bench-venv/bin/python \
    maturin pytest pytest-benchmark certifi setuptools cffi
uv pip install --python /tmp/bench-venv/bin/python -e vectors/

# baseline (main branch)
git checkout main
cp tests/bench/test_x509.py /tmp/bench_test.py   # copy new benchmarks over
VIRTUAL_ENV=/tmp/bench-venv maturin develop --release
/tmp/bench-venv/bin/python -m pytest tests/bench/test_x509.py \
    -k "subject or issuer or public_key or signature or crl_serial or ocsp" \
    --benchmark-json=/tmp/bench_base.json --benchmark-enable \
    --benchmark-warmup=on --benchmark-min-rounds=200 -q

# PR branch
git checkout abbra-p-f
VIRTUAL_ENV=/tmp/bench-venv maturin develop --release
/tmp/bench-venv/bin/python -m pytest tests/bench/test_x509.py \
    -k "subject or issuer or public_key or signature or crl_serial or ocsp" \
    --benchmark-json=/tmp/bench_pr.json --benchmark-enable \
    --benchmark-warmup=on --benchmark-min-rounds=200 -q

python3 .github/bin/compare_benchmarks.py /tmp/bench_base.json /tmp/bench_pr.json
```

abbra and others added 9 commits March 8, 2026 13:24
The existing load benchmarks create a fresh object each iteration, so
the cache is always cold and caching optimisations show no benefit there.
Add benchmarks that construct the object once and then repeatedly call
the getter, exercising the warm-cache path:

  Certificate : subject, issuer, public_key(),
                signature_hash_algorithm, signature_algorithm_oid
  CRL         : issuer, serial-number lookup (hit and miss)
  OCSPRequest : issuer_name_hash, issuer_key_hash,
                hash_algorithm, serial_number (all in one bench)
  OCSPResponse: issuer_key_hash, serial_number,
                signature_hash_algorithm (all in one bench)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Alexander Bokovoy <abokovoy@redhat.com>
…ect caching

The test assumed cert.subject re-parses the Name on every call, so it
checked each too-long-country warning in its own pytest.warns block.
After subject caching, parse_name runs only once (on the first access)
and emits both COUNTRY_NAME and JURISDICTION_COUNTRY_NAME warnings in a
single call. Subsequent accesses return the cached Name object without
re-parsing, so the second block saw no warnings.

Merge both assertions into a single pytest.warns block, which correctly
captures all warnings emitted during the first (and only) parse.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Alexander Bokovoy <abokovoy@redhat.com>
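The single-emission behaviour this commit adapts the test to can be sketched with the stdlib warnings module; parse_once is a hypothetical stand-in for the first cert.subject access:

```python
import warnings

def parse_once():
    # Stand-in for the first (and only) parse_name run: once the subject
    # is cached, both warnings are emitted during this single call.
    warnings.warn("country name too long", UserWarning)
    warnings.warn("jurisdiction country name too long", UserWarning)
    return "parsed-name"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    name = parse_once()  # first access: parses and warns
    cached = name        # later accesses return the cache, warning nothing

# A single capture block sees both warnings; a second block after this
# point would see none, which is why the two pytest.warns blocks had to
# be merged into one.
assert len(caught) == 2
```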
Wrap the attributes getter in PyOnceLock so the expensive loop over
ASN.1 attributes (OID conversion, PyBytes allocation, Attributes
construction) runs at most once per CertificateSigningRequest object.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Alexander Bokovoy <abokovoy@redhat.com>
Wrap issuer_name_hash, issuer_key_hash, hash_algorithm, and
serial_number getters in PyOnceLock so the allocations (PyBytes
construction, integer conversion, hash-object instantiation) happen
at most once per OCSPRequest object.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Alexander Bokovoy <abokovoy@redhat.com>
…gorithm getter results

Wrap the five most-frequently-accessed computed properties in PyOnceLock
so the underlying work (name parsing, public-key loading, OID conversion,
hash-algorithm object construction) runs at most once per Certificate
object regardless of how many times callers read the attribute.

Also update all Certificate struct construction sites (pkcs7.rs,
ocsp_resp.rs) to initialise the new cache fields.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Alexander Bokovoy <abokovoy@redhat.com>
The old implementation used index-based .nth(i) over a freshly-cloned
iterator per certificate, making the total work O(n²) in the number of
embedded certs. Also, each call rebuilt the Python list from scratch.

Replace with a single linear pass using asn1::write_single to obtain
independent DER bytes for each certificate (avoiding the need for the
unsafe map_arc_data_ocsp_response helper), then wrap the built PyList
in a PyOnceLock so subsequent calls return the cached object.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Alexander Bokovoy <abokovoy@redhat.com>
OCSPSingleResponse lacked an extensions getter entirely. Add one backed
by a PyOnceLock so the extension-parsing work runs at most once per
response object. Handles SCT and CRL entry extensions via the shared
parse_and_cache_extensions helper.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Alexander Bokovoy <abokovoy@redhat.com>
Wrap the issuer, signature_algorithm_oid, and signature_hash_algorithm
getters in PyOnceLock so name parsing and OID/hash-object construction
each run at most once per CertificateRevocationList object.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Alexander Bokovoy <abokovoy@redhat.com>
get_revoked_certificate_by_serial_number previously iterated over every
revoked certificate on each call (O(n)). Build a HashMap<Vec<u8>,
OwnedRevokedCertificate> on first call using the existing iterator
infrastructure, then answer subsequent lookups in O(1).

Also removes the now-unused try_map_crl_to_revoked_cert unsafe helper.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Alexander Bokovoy <abokovoy@redhat.com>
…icates caching

OCSPSingleResponse.extensions was added in commit 986298b but had no
tests. Add four tests in TestOCSPResponse:

* test_single_response_extensions_empty – a typical response with no
  per-SingleResponse extensions returns an empty Extensions object and
  the result is the same cached object on repeated access.

* test_single_response_extensions_sct – resp-sct-extension.der carries
  an SCT list in the raw_single_extensions field; verify it is exposed
  via the new getter on the OCSPSingleResponse iterator item.

* test_single_response_extensions_reason – resp-single-extension-reason.der
  carries a CRLReason; verify it surfaces correctly.

* test_certificates_cached – OCSPResponse.certificates is cached behind a
  PyOnceLock; verify that two successive accesses return the identical
  Python list object (is-identity check).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Alexander Bokovoy <abokovoy@redhat.com>

alex commented Mar 8, 2026

Thanks for submitting this -- for ease of review, can you split this into a few smaller PRs? My suggestion would be to start with splitting out:

  1. The subject/issuer properties
  2. The public key properties

and we can go from there. Thanks


abbra commented Mar 8, 2026

@alex thanks, I opened #14442 for the first one. Since all other PRs would depend on the previous ones being merged, should I wait with the remaining ones?

@reaperhulk
Member

Yeah, since GH hasn’t shipped dependent PRs yet, you should just submit one and, once it merges, rebase and submit the next.

